Abstract. The paper is devoted to the study of a parametric deformation model
of independent and identically distributed random variables. Firstly, we construct an efficient
and very easy to compute recursive estimate of the parameter. Our stochastic 
estimator is similar to the Robbins-Monro procedure where the contrast function 
is the Wasserstein distance. Secondly, we propose a recursive estimator similar 
to that of Parzen-Rosenblatt kernel density estimator in order to estimate the 
density of the random variables. This estimate takes into account the previous 
estimation of the parameter of the model. Finally, we illustrate the performance 
of our estimation procedure on simulations for the Box-Cox transformation and 
the arcsinh transformation. 



1. INTRODUCTION 

In many situations, random variables are not directly observed but only their 
image by a deformation is available. Hence, finding the mean behaviour of a data 
sample becomes a difficult task since the usual notion of Euclidean mean is too 
rough when the information conveyed by the data possesses an inner geometry far 
from the Euclidean one. Indeed, deformations on the data such as translations, scale 
location models for instance or more general warping procedures prevent the use of 
the usual methods in data analysis. 

On the one hand, the deformations may result from some variations which are 
not directly correlated to the studied phenomenon. This situation occurs often in 
biology for example when considering gene expression data obtained from microarray 
technologies to measure genome wide expression levels of genes in a given organism 
as described in [2]. A natural way to handle this phenomenon is to remove these
variations in order to align the measured densities. However, this is quite difficult to
implement since the densities are unknown. In bioinformatics and computational
biology, a method to reduce this kind of variability is known as normalization (see
[8] and references therein).

In epidemiology, removing variations is important in medical studies, where one 
observes age-at-death of several cohorts. Indeed, the individuals or animals that are members
of a cohort enjoy different life conditions, which means that time variation is likely
to exist between the cohort densities and hazard rates due to the effects of the different
biotopes on aging. Synchronization of the different observations is thus a crucial
point before any statistical study of the data. 

On the other hand, the variations on the observations are often due to transfor- 
mations that have been conducted by the statisticians themselves. In econometric 
science, transformations have been used to aid interpretability as well as to im- 
prove statistical performance of some indicators. An important contribution to this 
methodology was made by Box and Cox in [3] who proposed a parametric power 
family of transformations that nested the logarithm and the level. Estimation in 
this framework is achieved in [16].

In this work, we concentrate on the case where the data and their transformation 
are observed in a sequence model defined, for all $n \geq 0$, by

(1.1)    $X_n = \varphi_\theta(\varepsilon_n)$

where, for all $t \in \mathbb{R}$, the family of parametric functions $(\varphi_t)$ is known and $(\varepsilon_n)$
is a sequence of independent and identically distributed random variables. Our
main goal is to estimate recursively the unknown parameter $\theta$ by favouring an
alignment in distribution. More precisely, our approach to estimate $\theta$ is associated
with a stochastic recursive algorithm similar to that of Robbins-Monro described in 
[IH] and L20J. 

Assume that one can find a function $\phi$ (called contrast function) free of the parameter
$\theta$, such that $\phi(\theta) = 0$. Then, it is possible to estimate $\theta$ by the Robbins-Monro
algorithm

(1.2)    $\hat\theta_{n+1} = \hat\theta_n + \gamma_n T_{n+1}$

where $(\gamma_n)$ is a positive sequence of real numbers decreasing towards zero and $(T_n)$
is a sequence of random variables such that $E[T_{n+1} | \mathcal{F}_n] = \phi(\hat\theta_n)$, where $\mathcal{F}_n$ stands
for the $\sigma$-algebra of the events occurring up to time $n$. Under standard conditions
on the function $\phi$ and on the sequence $(\gamma_n)$, it is well-known (see [7] and [13])
that $\hat\theta_n$ tends to $\theta$ almost surely. The asymptotic normality of $\hat\theta_n$ together with the
quadratic strong law may also be found in [12]. A randomly truncated version of 
the Robbins-Monro algorithm is also given in fJ], \T^, whereas we can find in |lj an 
application of the Robbins-Monro algorithm in semiparametric regression models. 
In our framework, if we assume that $\varphi_t$ is invertible, then one can consider

$Z_n(t) = \varphi_t^{-1}(X_n)$.

Hence, a natural registration criterion is to minimize with respect to $t$ the quadratic
distance between $Z_n(t)$ and $\varepsilon_n$

$M(t) = E\left[ |Z_n(t) - \varepsilon_n|^2 \right]$.

It is then obvious that the parameter $\theta$ is a global minimum of $M$ and one can
implement a Robbins-Monro procedure for the contrast function $M'$, which is the
differential of the function $M$.

The second part of the paper concerns the estimation of the density $f$ of the random
variables $(\varepsilon_n)$. More precisely, we focus our attention on the Parzen-Rosenblatt
estimator of $f$ described for instance in [18] or [21]. Under reasonable conditions
on the function $f$, Parzen established in [18] the pointwise convergence in probability
and the asymptotic normality of the estimator without the parameter $\theta$. In
[22], Silverman obtained uniform consistency properties of the estimator. Moreover,
important contributions on the $L^1$-integrated risk have been obtained by Devroye in
[6] whereas Hall has studied in and [Hj the L^-integrated risk. In our situation, 
we propose to make use of a recursive Parzen-Rosenblatt estimator of $f$ which takes
into account the previous estimation of the parameter $\theta$. It is given, for all $x \in \mathbb{R}$,
by

(1.3)    $\hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} W_i(x)$

with

$W_i(x) = \frac{1}{h_i} K\left( \frac{x - Z_i(\hat\theta_{i-1})}{h_i} \right)$

where the kernel $K$ is a chosen probability density function and the bandwidth $(h_n)$
is a sequence of positive real numbers decreasing to zero. The main difficulty arising
here is that we have to deal with the term $Z_i(\hat\theta_{i-1})$ inside the kernel $K$.

The paper is organized as follows. Section 2 is devoted to the description
of the model. Section 3 deals with the parametric estimation of $\theta$. We establish
the almost sure convergence of $\hat\theta_n$ as well as its asymptotic normality. In Section
4, under standard regularity assumptions on the kernel $K$, we prove the almost
sure pointwise and quadratic convergences of $\hat f_n(x)$ to $f(x)$. Section 5 contains
some numerical experiments on the well-known Box-Cox transformation and on the
arcsinh transformation illustrating the performance of our parametric estimation
procedure. The proofs of the parametric results are given in Section 6, while those
concerning the nonparametric results are postponed to Section 7.



2. DESCRIPTION OF THE MODEL AND THE CRITERION 
Suppose that we observe independent and identically distributed random variables
$\varepsilon_n$ and a deformation $X_n$ of $\varepsilon_n$ according to the model (1.1) defined, for all $n \geq 0$,
by

$X_n = \varphi_\theta(\varepsilon_n)$

where $\theta \in \Theta \subset \mathbb{R}$. Throughout the paper, we denote by $\varepsilon$ and $X$ random variables
sharing the same distribution as $\varepsilon_n$ and $X_n$, respectively.

Assume that for all $t \in \mathbb{R}$, the family of parametric functions $(\varphi_t)$ is known but
that the parameter $\theta$ is unknown. This situation corresponds to the case where the
warping operator can be modeled by a parametric shape. Estimating the parameter
is the key to understanding the amount of deformation in the chosen deformation
class. This model has been widely used in the regression case, see for instance
[9]. Assume also that for all $t \in \mathbb{R}$, $\varphi_t$ is invertible on an interval which will be
made precise in the next section. Then, one can consider the random variable $Z_n(t)$
defined as

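(2.1)    $Z_n(t) = \varphi_t^{-1}(X_n) = \varphi_t^{-1}(\varphi_\theta(\varepsilon_n))$.

We also denote by $Z(t)$ a random variable sharing the same distribution as $Z_n(t)$.
In order to estimate $\theta$, we choose to evaluate the distance between $\varepsilon$ and $Z(t)$
which is given by

(2.2)    $M(t) = E\left[ |Z(t) - \varepsilon|^2 \right]$.

Denoting by $F^{-1}$ the quantile function associated with $\varepsilon$, it can be rewritten as

$M(t) = E\left[ |\varphi_t^{-1}(\varphi_\theta(\varepsilon)) - \varepsilon|^2 \right] = \int_0^1 \left( \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) - F^{-1}(x) \right)^2 dx$.

Indeed (see for instance [23] p. 305) it is well-known that if $Y$ is a random variable
with distribution function $G$, then for $U \sim \mathcal{U}_{[0;1]}$, $Y \sim G^{-1}(U)$.
Moreover, if we assume that for all $t$, $\varphi_t$ is increasing, then one has the following
expression for the quantile function associated with $Z(t)$: $F_{Z(t)}^{-1} = \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}$,
and so

$M(t) = \int_0^1 \left( F_{Z(t)}^{-1}(x) - F^{-1}(x) \right)^2 dx$.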



This quantity corresponds to the Wasserstein distance between the laws of $Z(t)$
and $\varepsilon$, defined and studied for instance in [5] in the general case. Using Wasserstein
metrics to align distributions is rather natural since it corresponds to the transportation
cost between two probability laws. It is also a proper criterion to study
similarities between point distributions (see for instance [17]) which is already
used for density registration in jT3] or jH] in a non sequential way. 
Hence, in this setting, considering the distance between the starting point and
the registered point is equivalent to investigating the Wasserstein distance between
their laws.



As $M(\theta) = 0$ and the function $M$ defined by (2.2) is non-negative, it is clear that
$M$ admits at least a global minimum at $\theta$, which provides a characterization
of the parameter of interest.
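The quantile form of the criterion can be evaluated numerically. The following is a minimal sketch, not the authors' code: it assumes the arcsinh deformation of Section 5, a uniform law on [1; 2] for the noise (so that the quantile function is $F^{-1}(u) = 1 + u$), and a plain midpoint rule on [0; 1]. The function names `phi`, `phi_inv`, `contrast_M` and `uniform_quantile` are hypothetical.

```python
import math

def phi(t, x):
    """Arcsinh deformation phi_t(x) = asinh(t*x)/t (identity when t == 0)."""
    return math.asinh(t * x) / t if t != 0 else x

def phi_inv(t, y):
    """Inverse deformation phi_t^{-1}(y) = sinh(t*y)/t (identity when t == 0)."""
    return math.sinh(t * y) / t if t != 0 else y

def contrast_M(t, theta, quantile, n_grid=2000):
    """Midpoint-rule approximation of the quantile form of M(t)."""
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) / n_grid          # midpoint of the k-th cell of [0, 1]
        q = quantile(u)                 # F^{-1}(u)
        total += (phi_inv(t, phi(theta, q)) - q) ** 2
    return total / n_grid

def uniform_quantile(u):
    """Quantile function of the uniform law on [1, 2] (illustrative choice)."""
    return 1.0 + u

theta = 1.0
values = {t: contrast_M(t, theta, uniform_quantile) for t in (0.5, 1.0, 1.5)}
for t, m in sorted(values.items()):
    print(f"M({t}) ~ {m:.6f}")
```

Under these assumptions the value at $t = \theta$ vanishes up to rounding, while the values at the two other points are strictly positive, in line with the characterization of $\theta$ as a global minimum.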

3. ESTIMATION OF THE PARAMETER $\theta$

In this section, we focus our attention on the estimation of the parameter $\theta \in \Theta$
where $\Theta$ is supposed to be an interval of $\mathbb{R}$. Before implementing the estimation
procedure for $\theta$, several hypotheses on the model (1.1) are required.



(A1) For all $t \in \Theta$, $\varphi_t$ is invertible, increasing from $I_1$ to $I_2$, some subsets of $\mathbb{R}$.

(A2) For all $x \in I_2$, $\varphi_t^{-1}(x)$ is continuously differentiable with respect to $t \in \Theta$.
Its derivative is denoted by $\partial\varphi_t^{-1}(x)$.

(A3) For all $t \in \Theta$, $\varphi_t^{-1} \circ \varphi_\theta \in L^2(\varepsilon)$.

(A4) For all compact $B$ in $\Theta$, $E\left[ \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^4 \right] < +\infty$.



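From assumption (A1), the distribution function of $X$ is $F_X = F \circ \varphi_\theta^{-1}$ whereas
that of $Z(t)$ is $F \circ \varphi_\theta^{-1} \circ \varphi_t$.

Lemma 3.1. Assume (A1) to (A4). Then $M$ is continuously differentiable on $\Theta$.

Using Lemma 3.1, the differential $M'$ of $M$ has the following expression for all
$t \in \Theta$,

$M'(t) = -2 \int_0^1 \partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) dx$

(3.1)    $M'(t) = -2 E\left[ \partial\varphi_t^{-1}(X) \left( \varepsilon - \varphi_t^{-1}(X) \right) \right]$.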

It is then clear that $M'(\theta) = 0$. Then, we can assume that there exists $\{a, b\} \subset \mathbb{R}$
with $a < b$ and $\theta \in ]a;b[ \subset \Theta$ such that, for all $t \in [a;b]$,

(A5)    $(t - \theta) M'(t) > 0$.

We are now in position to implement our Robbins-Monro procedure. More precisely,
denote by $\pi_{[a;b]}$ the projection on the compact set $[a;b]$ defined for all $x \in \mathbb{R}$ by

$\pi_{[a;b]}(x) = x \mathbf{1}_{\{a \leq x \leq b\}} + a \mathbf{1}_{\{x < a\}} + b \mathbf{1}_{\{x > b\}}$.

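Let $(\gamma_n)$ be a decreasing sequence of positive real numbers satisfying

(3.2)    $\sum_{n=1}^{\infty} \gamma_n = +\infty$    and    $\sum_{n=1}^{\infty} \gamma_n^2 < +\infty$.

We estimate the parameter $\theta$ via the projected Robbins-Monro algorithm

(3.3)    $\hat\theta_{n+1} = \pi_{[a;b]}\left( \hat\theta_n - \gamma_{n+1} T_{n+1} \right)$

where the deterministic initial value $\hat\theta_0 \in [a;b]$ and the random variable $T_{n+1}$ is
defined by

(3.4)    $T_{n+1} = -2 \partial\varphi_{\hat\theta_n}^{-1}(X_{n+1}) \left( \varepsilon_{n+1} - \varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right)$.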

Our results of convergence for the estimator $\hat\theta_n$ are as follows.

Theorem 3.1. Assume (A1) to (A5), with $\theta \in ]a;b[$ where $a < b$. Then, $\hat\theta_n$
converges almost surely to $\theta$.
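The projected recursion (3.3) with the increment (3.4) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it uses the arcsinh deformation of Section 5 (inverse $\sinh(ty)/t$ and its derivative in $t$), noise uniform on [1; 2], the step $\gamma_n = 1/n$, the bounds $a = 1/10$, $b = 2$, a starting value 0.5 and a sample size 5000. The helper names `inv`, `d_inv`, `project` and `robbins_monro` are hypothetical.

```python
import math
import random

def inv(t, y):
    """phi_t^{-1}(y) for the arcsinh deformation, t > 0."""
    return math.sinh(t * y) / t

def d_inv(t, y):
    """Derivative of phi_t^{-1}(y) with respect to t."""
    return (y * math.cosh(t * y) - math.sinh(t * y) / t) / t

def project(x, a, b):
    """Projection of x on the compact interval [a, b]."""
    return min(max(x, a), b)

def robbins_monro(xs, eps, theta0, a, b):
    """Projected recursion (3.3) with step gamma_n = 1/n; returns the whole path."""
    theta = theta0
    path = [theta]
    for n, (x, e) in enumerate(zip(xs, eps), start=1):
        z = inv(theta, x)                       # registered point Z(theta_hat)
        t_next = -2.0 * d_inv(theta, x) * (e - z)   # increment (3.4)
        theta = project(theta - t_next / n, a, b)
        path.append(theta)
    return path

random.seed(0)
theta_true, a, b = 1.0, 0.1, 2.0
eps = [1.0 + random.random() for _ in range(5000)]           # uniform on [1, 2]
xs = [math.asinh(theta_true * e) / theta_true for e in eps]  # X_n = phi_theta(eps_n)
path = robbins_monro(xs, eps, theta0=0.5, a=a, b=b)
print("final estimate:", path[-1])
```

Note that the increment vanishes exactly when the current estimate equals $\theta$, which is why the recursion carries no residual noise at the target.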

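In order to get a control on the rate of convergence of $\hat\theta_n$ towards $\theta$, we need to
assume the following slightly stronger condition of regularity on the deformation
functions.

(A6) For all $x \in I_2$, $\varphi_t^{-1}(x)$ is twice differentiable with respect to $t \in \Theta$
and for all compact $B$ in $\Theta$, $E\left[ \sup_{t \in B} |\partial^2\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^2 \right] < +\infty$.

Lemma 3.2. Assume (A1) to (A6). Then $M$ is twice continuously differentiable
on $\Theta$.

Then we can compute the second differential $M''$ of $M$ for all $t \in \Theta$ as

(3.5)    $M''(t) = 2 \int_0^1 \left[ \partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right]^2 dx - 2 \int_0^1 \partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) dx$

that is

(3.6)    $M''(t) = 2 E\left[ \left( \partial\varphi_t^{-1}(X) \right)^2 \right] - 2 E\left[ \partial^2\varphi_t^{-1}(X) \left( \varepsilon - \varphi_t^{-1}(X) \right) \right]$.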

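For the sake of clarity, we shall make use of $\gamma_n = 1/n$ for the following theorem.

Theorem 3.2. Assume (A1) to (A6), with $\theta \in ]a;b[$ where $a < b$. In addition,
suppose that $M''(\theta) > 1/2$ and that there exists $\alpha > 4$ such that for all compact $B$
in $\Theta$,

$E\left[ \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^{\alpha} \right] < +\infty$.

Then, we have as $n$ goes to infinity, the degenerated asymptotic normality

(3.7)    $\sqrt{n} \left( \hat\theta_n - \theta \right) \xrightarrow{\mathcal{L}} \delta_0$.

Moreover, if for all $t \in [a;b]$,

(A7)    $M''(t) > 1/2$,

then for all $n \geq 0$,

(3.8)    $E\left[ |\hat\theta_n - \theta|^2 \right] \leq |\hat\theta_0 - \theta|^2 \frac{\exp(C_1 \pi^2/6)}{n+1}$

where

(3.9)    $C_1 = 4 E\left[ \sup_{t \in [a;b]} |\partial\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^4 \right]$.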
Proof. The proofs are postponed to Section 6. □

Remark 3.1. One can observe that

$M''(\theta) = 2 \int_0^1 \left[ \partial\varphi_\theta^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right]^2 dx = 2 E\left[ \left( \partial\varphi_\theta^{-1}(X) \right)^2 \right]$.

Hence the inequality $M''(\theta) > 0$ holds in the general case. Moreover, replacing $M$
by $\lambda M$ where $\lambda$ is a real and positive number does not change any results. Then,
the condition $M''(t) > 1/2$ may be verified with little modifications.

Remark 3.2. From a theoretical point of view, it could be interesting to obtain
a non-degenerated asymptotic normality rather than the one obtained in (3.7). For that
purpose, one considers a slight modification of the algorithm defined by (3.3). More
precisely, it consists in replacing the algorithm (3.3) by its "excited" version

(3.10)    $\tilde\theta_{n+1} = \pi_{[a;b]}\left( \tilde\theta_n - \gamma_{n+1} \tilde T_{n+1} \right)$

where the initial deterministic value $\tilde\theta_0 \in [a;b]$ and the random variable $\tilde T_{n+1}$ is
defined by

(3.11)    $\tilde T_{n+1} = -2 \partial\varphi_{\tilde\theta_n}^{-1}(X_{n+1}) \left( \varepsilon_{n+1} - \varphi_{\tilde\theta_n}^{-1}(X_{n+1}) \right) + V_{n+1}$

where $(V_n)$ is a sequence of independent and identically distributed simulated random
variables with mean $0$ and variance $\sigma^2 > 0$. Then, thanks to this persistent
excitation, Theorem 3.1 and Theorem 3.2 are still true for $\tilde\theta_n$ where (3.7) is replaced
by

(3.12)    $\sqrt{n} \left( \tilde\theta_n - \theta \right) \xrightarrow{\mathcal{L}} \mathcal{N}\left( 0, \frac{\sigma^2}{2M''(\theta) - 1} \right)$.
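The excited recursion differs from (3.3) only by the simulated noise added to the increment. The sketch below is a hedged illustration, not the authors' code: it reuses the arcsinh deformation of Section 5, assumes noise uniform on [1; 2], Gaussian excitation with variance $\sigma^2 = 1/2$, the step $\gamma_n = 1/n$ and a starting value 0.5. The names `inv`, `d_inv` and `excited_robbins_monro` are hypothetical.

```python
import math
import random

def inv(t, y):
    """phi_t^{-1}(y) for the arcsinh deformation, t > 0."""
    return math.sinh(t * y) / t

def d_inv(t, y):
    """Derivative of phi_t^{-1}(y) with respect to t."""
    return (y * math.cosh(t * y) - math.sinh(t * y) / t) / t

def excited_robbins_monro(xs, eps, theta0, a, b, sigma2, rng):
    """Recursion (3.10): the increment (3.11) adds a centred noise V_{n+1}."""
    theta = theta0
    path = [theta]
    for n, (x, e) in enumerate(zip(xs, eps), start=1):
        v = rng.gauss(0.0, math.sqrt(sigma2))            # simulated excitation
        t_next = -2.0 * d_inv(theta, x) * (e - inv(theta, x)) + v
        theta = min(max(theta - t_next / n, a), b)       # projection on [a, b]
        path.append(theta)
    return path

rng = random.Random(1)
theta_true, a, b = 1.0, 0.1, 2.0
eps = [1.0 + rng.random() for _ in range(2000)]
xs = [math.asinh(theta_true * e) / theta_true for e in eps]
excited_path = excited_robbins_monro(xs, eps, 0.5, a, b, sigma2=0.5, rng=rng)
print("last excited estimate:", excited_path[-1])
```

Because the excitation does not vanish at $\theta$, the increments keep a variance $\sigma^2$ at the target, which is the source of the non-degenerate limit in (3.12).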



4. ESTIMATION OF THE DENSITY

In this section, we suppose that the random variable $\varepsilon$ has a density $f$ and we
focus on the non-parametric estimation of this density. A natural way to estimate
$f$ is to consider the recursive Parzen-Rosenblatt estimator defined for all $x \in I_1$, by

(4.1)    $\tilde f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i} K\left( \frac{x - \varepsilon_i}{h_i} \right)$

where $K$ is a standard kernel function. It is well known that $\tilde f_n$ is a good
approximation of $f$ for large values of $n$. However, for small samples corresponding
to small values of $n$, $\tilde f_n$ may not be a good estimator of $f$. Hence, it could be
interesting to have more realizations of $\varepsilon$ in order to get a better approximation. In
our case, we know that $Z_n(\theta) = \varepsilon_n$. Then, the idea is to use the prior estimation of $\theta$
in order to construct a Parzen-Rosenblatt estimator of $f$ which will be of length $2n$.



Further assumptions must be added to hypotheses (A1) to (A6). More precisely,
if we denote by $\partial$ the differential operator with respect to $t$ and $d$ the differential
operator with respect to $x$, we need the following hypotheses on the regularity of $f$
and on the deformation functions $\varphi_t$.

(AD1) $f$ is bounded, twice continuously differentiable on $I_1$,
with bounded derivatives.

(AD2) For all $t \in \Theta$, $\varphi_t$ is three times continuously differentiable on $I_1$.

(AD3) $\varphi_\theta^{-1}$ is three times continuously differentiable on $I_2$,
with bounded derivatives.

(AD4) $d\varphi$, $d^2\varphi$, $d^3\varphi$ are bounded.



Denote by $K$ a positive kernel which is a symmetric, integrable and bounded
function, such that

$\int_{\mathbb{R}} K(u) du = 1$,    $\lim_{|x| \to +\infty} |x| K(x) = 0$,    and    $\int_{\mathbb{R}} u^2 K(u) du < +\infty$.

Then we consider the following recursive estimate

(4.2)    $\hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i} K\left( \frac{x - Z_i(\hat\theta_{i-1})}{h_i} \right)$

where $\hat\theta_{i-1}$ is given by (3.3) and where the bandwidth $(h_n)$ is a sequence of positive
real numbers, decreasing to zero, such that $n h_n$ tends to infinity when $n$ goes to
infinity. For the sake of simplicity, we make use of $h_n = 1/n^{\alpha}$ with $0 < \alpha < 1$. The
following result deals with the pointwise almost sure convergence of $\hat f_n(x)$.



Theorem 4.1. Assume (A1) to (A5) with $\theta \in ]a;b[$ where $a < b$ and (AD1) to
(AD4). Then for all $x \in I_1$,

(4.3)    $\hat f_n(x) \longrightarrow f(x)$    a.s.
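The recursive estimate (4.2) can be updated in the same loop as the parametric recursion (3.3), since each new registered point only needs the previous estimate of $\theta$. The sketch below is an illustration under stated assumptions, not the authors' code: Gaussian kernel, bandwidth $h_i = i^{-\alpha}$ with $\alpha = 1/2$, arcsinh deformation of Section 5, noise uniform on [1; 2] (so the target density equals 1 on that interval), and hypothetical helper names.

```python
import math
import random

def inv(t, y):
    """phi_t^{-1}(y) for the arcsinh deformation, t > 0."""
    return math.sinh(t * y) / t

def d_inv(t, y):
    """Derivative of phi_t^{-1}(y) with respect to t."""
    return (y * math.cosh(t * y) - math.sinh(t * y) / t) / t

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def recursive_density(xs, eps, grid, theta0, a, b, alpha):
    """Joint update of theta_hat (3.3) and of the kernel sums defining (4.2)."""
    theta = theta0
    sums = [0.0] * len(grid)
    for i, (x, e) in enumerate(zip(xs, eps), start=1):
        z = inv(theta, x)                 # Z_i(theta_hat_{i-1})
        h = i ** (-alpha)                 # bandwidth h_i
        for k, point in enumerate(grid):
            sums[k] += gaussian_kernel((point - z) / h) / h
        t_next = -2.0 * d_inv(theta, x) * (e - z)
        theta = min(max(theta - t_next / i, a), b)
    return [s / len(xs) for s in sums], theta

random.seed(2)
theta_true = 1.0
eps = [1.0 + random.random() for _ in range(4000)]
xs = [math.asinh(theta_true * e) / theta_true for e in eps]
grid = [1.5, 3.0]
density, theta_hat = recursive_density(xs, eps, grid, 0.5, 0.1, 2.0, alpha=0.5)
print(dict(zip(grid, density)), theta_hat)
```

The estimate at the interior point 1.5 should be close to the true value 1, while the estimate at 3.0, far outside the support, should be close to 0.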



It follows from Theorem 4.1 that for small values of $n$, the averaged estimator

$\bar f_n = \frac{1}{2} \left( \tilde f_n + \hat f_n \right)$

where $\tilde f_n$ and $\hat f_n$ are given by (4.1) and (4.2), will perform better than $\tilde f_n$ or $\hat f_n$.

The second result of this section concerns the convergence in quadratic mean of
$\hat f_n(x)$ to $f(x)$. To this end, we need to add to hypotheses (AD1) to (AD4) the
following slightly stronger assumption on the regularity of the deformation functions
$\varphi$.

(AD5) $\varphi$ is twice continuously differentiable on $\Theta \times I_1$
and $\partial\varphi_t(x)$, $\partial d\varphi_t(x)$ are bounded with respect to $t$.

Theorem 4.2. Assume (A1) to (A7) with $\theta \in ]a;b[$ where $a < b$ and (AD1) to
(AD5). Then, for all $x \in I_1$,

(4.4)    $E\left[ |\hat f_n(x) - f(x)|^2 \right] \longrightarrow 0$.

Proof. The proofs are postponed to Section 7. □

5. SIMULATIONS

This section is devoted to the numerical illustration of the asymptotic properties of
our estimator $\hat\theta_n$ defined by (3.3). Note that for the model (1.1), the transformations
$\varphi_\theta$ which are invertible with respect to $\theta$ have no great interest because, in this
case, it is possible to express $\theta$ in terms of $X_0, \ldots, X_n, \varepsilon_0, \ldots, \varepsilon_n$. However, when
$\varphi_\theta$ is not invertible with respect to $\theta$, it is not possible to use a direct expression
for the estimator and our procedure is useful in order to estimate $\theta$. Among the
many transformations of interest, we focus here on two of them that are used in
econometrics. More precisely, we illustrate our estimation procedure for the Box-Cox
transformation $\varphi^1_t$ and the arcsinh transformation $\varphi^2_t$. The transformation $\varphi^1_t$ is
given, for all $x \in \mathbb{R}_+^*$, by



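(5.1)    $\varphi^1_t(x) = \begin{cases} \dfrac{x^t - 1}{t} & \text{if } t \neq 0, \\ \log(x) & \text{if } t = 0, \end{cases}$

whereas $\varphi^2_t$ is given, for all $x \in \mathbb{R}$, by

(5.2)    $\varphi^2_t(x) = \begin{cases} \dfrac{\sinh^{-1}(tx)}{t} & \text{if } t \neq 0, \\ x & \text{if } t = 0. \end{cases}$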



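Throughout this section, we suppose that $\theta > 0$, and specifically we assume that $\theta \in
]a;b[$ with $a = 1/10$ and $b = 2$. Then, the Box-Cox transform $\varphi^1_t$ is invertible from
$]1;+\infty[$ to $\mathbb{R}_+^*$ and the arcsinh transformation $\varphi^2_t$ is invertible from $\mathbb{R}$ to $\mathbb{R}$. Moreover,
the inverses $(\varphi^1_t)^{-1}$ and $(\varphi^2_t)^{-1}$ of $\varphi^1_t$ and $\varphi^2_t$ are given by

(5.3)    $\forall x \in \mathbb{R}_+^*, \quad (\varphi^1_t)^{-1}(x) = (1 + tx)^{1/t}$

and

(5.4)    $\forall x \in \mathbb{R}, \quad (\varphi^2_t)^{-1}(x) = \frac{1}{t} \sinh(tx)$.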



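Hence, it is clear that for all $t \in [a;b]$, $(\varphi^1_t)^{-1}(x)$ and $(\varphi^2_t)^{-1}(x)$ are continuously
differentiable with respect to $t$ and that

(5.5)    $\forall x \in \mathbb{R}_+^*, \quad \partial(\varphi^1_t)^{-1}(x) = \frac{1}{t} (1 + tx)^{1/t} \left( \frac{x}{1 + tx} - \frac{1}{t} \log(1 + tx) \right)$

and

(5.6)    $\forall x \in \mathbb{R}, \quad \partial(\varphi^2_t)^{-1}(x) = -\frac{1}{t} \left( \frac{1}{t} \sinh(tx) - x \cosh(tx) \right)$.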




PHILIPPE FRAYSSE, HELENE LESCORNEL, AND JEAN-MICHEL LOUBES 



Denote by $M^1$, respectively $M^2$, the function $M$ given by (2.2) associated with $\varphi^1_t$
and $\varphi^2_t$. For the simulations, we choose $\theta = 1$. The functions $M^1$ and $M^2$ are
represented in Figure 1. One can see that $\theta$ is effectively a global minimum of $M^1$
and $M^2$.

Figure 1. The functions $M^1$ and $M^2$.

For the estimation of $\theta$ in both models, one chooses $(\varepsilon^1_n)$ a sequence of
independent random variables whose distribution is uniform on $[0;1]$ and $(\varepsilon^2_n)$ a
sequence of independent random variables whose distribution is uniform on $[1;2]$.
We simulate random variables $X^1_n$ and $X^2_n$ according to the model (1.1)

$X^i_n = \varphi^i_\theta(\varepsilon^i_n)$

for $i = 1, 2$. Then, for $i = 1, 2$ and for the choice of step $\gamma_n = 1/n$, we compute the
sequence $(\hat\theta^i_n)$ according to (3.3). More precisely,

$\hat\theta^i_{n+1} = \pi_{[a;b]}\left( \hat\theta^i_n - \gamma_{n+1} T^i_{n+1} \right)$

where

$T^i_{n+1} = -2 \partial(\varphi^i_{\hat\theta^i_n})^{-1}(X^i_{n+1}) \left( \varepsilon^i_{n+1} - (\varphi^i_{\hat\theta^i_n})^{-1}(X^i_{n+1}) \right)$

and where $(\varphi^i_t)^{-1}$ are given by (5.3) and (5.4) and $\partial(\varphi^i_t)^{-1}$ are given by (5.5) and
(5.6). The values of $\hat\theta^i_n$ are computed until $n = 1000$. We represent on the left-hand
side (respectively on the right-hand side) of Figure 2 the difference between
$\hat\theta^1_n$ and $\theta$ (respectively $\hat\theta^2_n$ and $\theta$) for $1 \leq n \leq 1000$. In particular, we obtain that
$|\hat\theta^1_{1000} - \theta| = 0.00239$ and $|\hat\theta^2_{1000} - \theta| = 0.0042$, showing that our procedure performs
very well for both models. In addition, on the left-hand side of Figure 3, one
has represented the degenerated asymptotic normality given by (3.7) for the data
generated according to the model (1.1) associated with $\varphi^1$. For that, we have made
200 realizations of the random variable $\sqrt{1000} (\hat\theta^1_{1000} - \theta)$. Finally, one also considers
the excited version (3.10) of algorithm (3.3) for the first deformation $\varphi^1$

$\tilde\theta^1_{n+1} = \pi_{[a;b]}\left( \tilde\theta^1_n - \gamma_{n+1} \tilde T^1_{n+1} \right)$

with

$\tilde T^1_{n+1} = -2 \partial(\varphi^1_{\tilde\theta^1_n})^{-1}(X^1_{n+1}) \left( \varepsilon^1_{n+1} - (\varphi^1_{\tilde\theta^1_n})^{-1}(X^1_{n+1}) \right) + V_{n+1}$

where the sequence $(V_n)$ is a sequence of independent random variables simulated
according to the law $\mathcal{N}(0, 1/2)$. As for the degenerated asymptotic normality, one
has made 200 realizations of the random variable $\sqrt{1000} (\tilde\theta^1_{1000} - \theta)$ in order to
illustrate the asymptotic normality given by (3.12). This last numerical result is
represented on the right-hand side of Figure 3.

Figure 2. Difference $\hat\theta^1_n - \theta$ and $\hat\theta^2_n - \theta$.

Figure 3. Asymptotic normalities of $\sqrt{n} (\hat\theta^1_n - \theta)$ and $\sqrt{n} (\tilde\theta^1_n - \theta)$.



6. PROOFS OF THE PARAMETRIC RESULTS 



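6.1. Proof of Lemma 3.1. First, (A4) obviously implies that for all compact $B$
in $\Theta$,

$E\left[ \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^2 \right] < +\infty$.

Moreover, we already saw that the quantile function associated with the distribution
of $\varepsilon$ is $F^{-1}$. Consequently,

(6.1)    $E\left[ \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(\varepsilon)|^2 \right] = \int_0^1 \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x)|^2 dx < +\infty$.

Now, it follows from (A2) that for all $x \in I_2$,

(6.2)    $\partial\left[ \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)^2 \right] = -2 \partial\varphi_t^{-1}\left( \varphi_\theta \circ F^{-1}(x) \right) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)$

is a continuous function with respect to $t$. In addition, if $B$ is a compact set containing
$\theta$, it follows from (A2) together with the mean value Theorem that there
exists a constant $C_B > 0$ such that

(6.3)    $\sup_{t \in B} |F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta(F^{-1}(x))| \leq C_B \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(F^{-1}(x))|$.

Hence, we deduce from (6.2) and the previous inequality that

$\sup_{t \in B} \left| \partial\left[ \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)^2 \right] \right| \leq 2 C_B \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta(F^{-1}(x))|^2$

which implies by (6.1) that

$\sup_{t \in B} \left| \partial\left[ \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)^2 \right] \right|$

is integrable with respect to $x$. Finally, $M$ is continuously differentiable on $\Theta$ and
for all $t \in \Theta$,

$M'(t) = \int_0^1 -2 \partial\varphi_t^{-1}\left( \varphi_\theta \circ F^{-1}(x) \right) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) dx$.

□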



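6.2. Proof of Lemma 3.2. Hypothesis (A6) implies that

(6.4)    $-2 \partial\varphi_t^{-1}\left( \varphi_\theta \circ F^{-1}(x) \right) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)$

is continuously differentiable with respect to $t$. In addition, we have

$\partial\left[ \partial\varphi_t^{-1}\left( \varphi_\theta \circ F^{-1}(x) \right) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) \right]$
$= -\left[ \partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right]^2 + \partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)$.

It follows from (6.3) that for every compact set $B$ containing $t$ and $\theta$,

$\left| \partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) \right| \leq C_B \sup_{t \in B} |\partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x)| \sup_{t \in B} |\partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x)|$.

Then, (A6) and (6.1) together with the Cauchy-Schwarz inequality imply that

$\partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right)$

is integrable with respect to $x$. Hence, we have

$\int_0^1 \sup_{t \in B} \left| \partial\left[ \partial\varphi_t^{-1}\left( \varphi_\theta \circ F^{-1}(x) \right) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) \right] \right| dx < +\infty$

which enables us to conclude that $M$ is twice continuously differentiable on $\Theta$ and
for all $t \in \Theta$,

$M''(t) = 2 \int_0^1 \left[ \partial\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right]^2 dx - 2 \int_0^1 \partial^2\varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_t^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) dx$.

□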



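6.3. Proof of Theorem 3.1. Denote by $\mathcal{F}_n$ the $\sigma$-algebra of the events occurring
up to time $n$, $\mathcal{F}_n = \sigma(\varepsilon_0, \ldots, \varepsilon_n)$. First of all, we shall calculate the two first
conditional moments of the random variable $T_n$ given by (3.4). On the one hand,
one has

$E[T_{n+1} | \mathcal{F}_n] = -2 E\left[ \partial\varphi_{\hat\theta_n}^{-1}(X_{n+1}) \left( \varepsilon_{n+1} - \varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right) | \mathcal{F}_n \right]$
$= -2 E\left[ \partial\varphi_{\hat\theta_n}^{-1} \circ \varphi_\theta(\varepsilon_{n+1}) \left( \varepsilon_{n+1} - \varphi_{\hat\theta_n}^{-1} \circ \varphi_\theta(\varepsilon_{n+1}) \right) | \mathcal{F}_n \right]$.

Moreover, as $\varepsilon_{n+1}$ is independent of $\mathcal{F}_n$ and $\hat\theta_n$ is $\mathcal{F}_n$-measurable, one can deduce from (3.1) that

$E[T_{n+1} | \mathcal{F}_n] = -2 \int_0^1 \partial\varphi_{\hat\theta_n}^{-1} \circ \varphi_\theta \circ F^{-1}(x) \left( F^{-1}(x) - \varphi_{\hat\theta_n}^{-1} \circ \varphi_\theta \circ F^{-1}(x) \right) dx = M'(\hat\theta_n)$    a.s.

which immediately leads to

(6.5)    $E[T_{n+1} | \mathcal{F}_n] = M'(\hat\theta_n)$    a.s.

On the other hand,

(6.6)    $E[T_{n+1}^2 | \mathcal{F}_n] = 4 E\left[ \left( \partial\varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right)^2 \left( \varepsilon_{n+1} - \varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right)^2 | \mathcal{F}_n \right]$
$= 4 E\left[ \left( \partial\varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right)^2 \left( \varphi_\theta^{-1}(X_{n+1}) - \varphi_{\hat\theta_n}^{-1}(X_{n+1}) \right)^2 | \mathcal{F}_n \right]$.

Moreover, it follows from the mean value Theorem that

(6.7)    $|\varphi_\theta^{-1}(X_{n+1}) - \varphi_{\hat\theta_n}^{-1}(X_{n+1})| \leq \sup_{t \in [a;b]} |\partial\varphi_t^{-1}(X_{n+1})| \times |\hat\theta_n - \theta|$.

Consequently, the conjunction of (6.6) and (6.7) leads to

(6.8)    $E[T_{n+1}^2 | \mathcal{F}_n] \leq 4 |\hat\theta_n - \theta|^2 E\left[ \sup_{t \in [a;b]} |\partial\varphi_t^{-1}(X)|^4 \right]$.

Hence, there exists a positive constant $C_1$ given by (3.9) such that

(6.9)    $E[T_{n+1}^2 | \mathcal{F}_n] \leq C_1 |\hat\theta_n - \theta|^2$    a.s.

Furthermore, for all $n \geq 0$, let $V_n = |\hat\theta_n - \theta|^2$. We clearly have

$V_{n+1} = |\hat\theta_{n+1} - \theta|^2 = \left| \pi_{[a;b]}\left( \hat\theta_n - \gamma_{n+1} T_{n+1} \right) - \pi_{[a;b]}(\theta) \right|^2$

as we have assumed that $\theta$ belongs to $]a;b[$. Since $\pi_{[a;b]}$ is a Lipschitz function with
Lipschitz constant 1, we obtain that

$V_{n+1} \leq \left( \hat\theta_n - \gamma_{n+1} T_{n+1} - \theta \right)^2 \leq V_n + \gamma_{n+1}^2 T_{n+1}^2 - 2 \gamma_{n+1} T_{n+1} (\hat\theta_n - \theta)$.

Hence, it follows from (6.5) together with (6.9) that

(6.10)    $E[V_{n+1} | \mathcal{F}_n] \leq V_n (1 + C_1 \gamma_{n+1}^2) - 2 \gamma_{n+1} (\hat\theta_n - \theta) M'(\hat\theta_n)$    a.s.

In addition, as $\hat\theta_n \in [a;b]$, (A5) implies that $(\hat\theta_n - \theta) M'(\hat\theta_n) > 0$. Then, we deduce
from (6.10) together with Robbins-Siegmund Theorem, see Duflo [7] page 18, that
the sequence $(V_n)$ converges a.s. to a finite random variable $V$ and

(6.11)    $\sum_{n=1}^{\infty} \gamma_{n+1} (\hat\theta_n - \theta) M'(\hat\theta_n) < +\infty$    a.s.

Assume by contradiction that $V \neq 0$ a.s. Then, one can find two constants $c$ and $d$
such that

$0 < c < d < 2 \max(|a|, |b|)$,

and for $n$ large enough, the event $\{c < |\hat\theta_n - \theta| < d\}$ is not negligible. However, on
this annulus, one can also find some constant $e > 0$ such that $(\hat\theta_n - \theta) M'(\hat\theta_n) > e$
which, by (6.11), implies that

$\sum_{n=1}^{\infty} \gamma_n < +\infty$.

This is of course in contradiction with assumption (3.2). Consequently, we obtain
that $V = 0$ a.s. leading to the almost sure convergence of $\hat\theta_n$ to $\theta$. □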



6.4. Proof of Theorem 3.2. Our goal is to apply Theorem 2.1 of Kushner and
Yin [13] page 330. First of all, as $\gamma_n = 1/n$, the conditions on the decreasing
step are satisfied. Moreover, we already saw that $\hat\theta_n$ converges almost surely to $\theta$.
Consequently, all the local assumptions of Theorem 2.1 of [13] are satisfied. In
addition, it follows from (6.5) that $E[T_{n+1} | \mathcal{F}_n] = M'(\hat\theta_n)$ a.s. and the function
$M$ is two times continuously differentiable. Hence, $M(\theta) = 0$, $M'(\theta) = 0$ and
$M''(\theta) > 1/2$. Furthermore, it follows from (6.9) and the almost sure convergence
of $\hat\theta_n$ to $\theta$ that

$\lim_{n \to \infty} E[T_{n+1}^2 | \mathcal{F}_n] = 0$    a.s.

Finally, Theorem 4.1 of [13] page 341 ensures that the sequence $(W_n)$ given by

$W_n = \sqrt{n} (\hat\theta_n - \theta)$

is tight. Then, one shall deduce from Theorem 2.1 of [13] that

$\sqrt{n} (\hat\theta_n - \theta) \xrightarrow{\mathcal{L}} \delta_0$.

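Moreover, taking expectation on both sides of (6.10) leads, for all $n \geq 0$, to

(6.12)    $v_{n+1} \leq v_n (1 + C_1 \gamma_{n+1}^2) - 2 \gamma_{n+1} E\left[ (\hat\theta_n - \theta) M'(\hat\theta_n) \right]$

where $v_n = E[V_n]$. In addition, as $M'(\theta) = 0$, one has

(6.13)    $M'(\hat\theta_n) = (\hat\theta_n - \theta) \int_0^1 M''(\theta + x(\hat\theta_n - \theta)) dx$    a.s.

Consequently, it follows from (6.12) and (6.13) that

(6.14)    $v_{n+1} \leq v_n (1 + C_1 \gamma_{n+1}^2) - 2 \gamma_{n+1} E\left[ |\hat\theta_n - \theta|^2 \int_0^1 M''(\theta + x(\hat\theta_n - \theta)) dx \right]$.

Finally, since $\theta \in ]a;b[$ and $\hat\theta_n \in [a;b]$, $\theta + x(\hat\theta_n - \theta) \in [a;b]$ for all $x \in [0;1]$. Then,
as we have supposed that $M''(t) > 1/2$ for all $t \in [a;b]$, we can write that

$\int_0^1 M''(\theta + x(\hat\theta_n - \theta)) dx > 1/2$.

Then, we find from (6.14) that for all $n \geq 0$,

(6.15)    $v_{n+1} \leq v_n (1 + C_1 \gamma_{n+1}^2 - \gamma_{n+1})$.

Moreover, the standard convex inequality given for all $x \in \mathbb{R}$, by

$1 - x \leq \exp(-x)$

implies that

(6.16)    $v_{n+1} \leq v_n \exp\left( C_1 \gamma_{n+1}^2 - \gamma_{n+1} \right)$.

An immediate recurrence in (6.16) leads to

(6.17)    $v_n \leq v_0 \prod_{k=1}^{n} \exp\left( C_1 \gamma_k^2 - \gamma_k \right) = v_0 \exp\left( C_1 \sum_{k=1}^{n} \gamma_k^2 - \sum_{k=1}^{n} \gamma_k \right) \leq v_0 \exp\left( C_1 \sum_{k=1}^{+\infty} \gamma_k^2 - \sum_{k=1}^{n} \gamma_k \right)$.

As $\gamma_k = 1/k$, it follows immediately from (6.17) together with

$\sum_{k=1}^{+\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$    and    $\sum_{k=1}^{n} \gamma_k \geq \log(n + 1)$

that, for all $n \geq 0$,

$v_n \leq v_0 \frac{\exp(C_1 \pi^2/6)}{n + 1}$

which achieves the proof of Theorem 3.2.

□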
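The last step of the proof can be checked numerically: with $\gamma_k = 1/k$, the product appearing in (6.17) never exceeds $\exp(C_1 \pi^2/6)/(n+1)$. The sketch below is only a sanity check with arbitrary test values of the constant; the function name `product_and_bound` is hypothetical.

```python
import math

def product_and_bound(c1, n):
    """Return the product of exp(c1/k**2 - 1/k), k = 1..n, and the bound exp(c1*pi**2/6)/(n+1)."""
    log_product = sum(c1 / k ** 2 - 1.0 / k for k in range(1, n + 1))
    bound = math.exp(c1 * math.pi ** 2 / 6.0) / (n + 1)
    return math.exp(log_product), bound

checks = []
for c1 in (0.5, 1.0, 4.0):
    for n in (1, 10, 1000):
        product, bound = product_and_bound(c1, n)
        checks.append(product <= bound)
        print(f"C1={c1}, n={n}: product={product:.6g} <= bound={bound:.6g}")
```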



7. PROOFS OF THE NONPARAMETRIC RESULTS 



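Recall that $f$ is the density of $\varepsilon$ and denote by $f^t$ the density of $Z(t)$. As the
distribution function of $Z(t)$ is $F \circ \varphi_\theta^{-1} \circ \varphi_t$, we have for all $x \in I_1$,

$f^t(x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x)$.

We can note that $f^\theta = f$. We start by stating some facts about the densities $f^t(x)$
which will be used hereinafter. Firstly, we have

$f^t(x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d\varphi_t(x) \, d\left[ \varphi_\theta^{-1} \right](\varphi_t(x))$.

Hence, the hypotheses (AD1), (AD2) and (AD3) imply that $f^t$ is twice continuously
differentiable with respect to $x$. Moreover, for all $x \in I_1$,

$df^t(x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d^2\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) + f'\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) \left( d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) \right)^2$

and

$d^2 f^t(x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d^3\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x)$
$\quad + 3 f'\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) \, d^2\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x)$
$\quad + f''\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) \left( d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) \right)^3$.

Hence, it follows from (AD1) to (AD4) that $f^t(x)$, $df^t(x)$ and $d^2 f^t(x)$ are bounded
on $\Theta \times I_1$. Secondly, (AD5) implies that $f^t(x)$ is also continuously differentiable
with respect to $(t, x)$ and we have for all $t \in \Theta$ and for all $x \in I_1$,

$\partial f^t(x) = f\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) \partial d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) + f'\left( \varphi_\theta^{-1} \circ \varphi_t(x) \right) \partial\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) \, d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x)$

where

$\partial\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) = \partial\varphi_t(x) \, d\left[ \varphi_\theta^{-1} \right](\varphi_t(x))$,

and

$\partial d\left[ \varphi_\theta^{-1} \circ \varphi_t \right](x) = \partial d\varphi_t(x) \, d\left[ \varphi_\theta^{-1} \right](\varphi_t(x)) + \partial\varphi_t(x) \, d\varphi_t(x) \, d^2\left[ \varphi_\theta^{-1} \right](\varphi_t(x))$.

Hence, under (AD4) and (AD5),

(7.1)    $\sup_{t \in \Theta} |\partial f^t(x)| < +\infty$.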
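The change-of-variable formula for $f^t$ can be checked numerically by verifying that it integrates to one. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the arcsinh deformation of Section 5 with $\theta = 1$, a standard Gaussian density for $f$ (a hypothetical choice made only because it is smooth), and a midpoint rule on a truncated interval. All names are hypothetical.

```python
import math

THETA = 1.0

def phi(t, x):
    """Arcsinh deformation phi_t(x) = asinh(t*x)/t, t > 0."""
    return math.asinh(t * x) / t

def registration_map(t, x):
    """phi_theta^{-1} o phi_t (x) = sinh(theta * phi_t(x)) / theta."""
    return math.sinh(THETA * phi(t, x)) / THETA

def registration_map_dx(t, x):
    """x-derivative of phi_theta^{-1} o phi_t, obtained by the chain rule."""
    return math.cosh(THETA * phi(t, x)) / math.sqrt(1.0 + (t * x) ** 2)

def f(x):
    """Standard Gaussian density, used as an illustrative smooth f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_t(t, x):
    """Density of Z(t): f(phi_theta^{-1} o phi_t (x)) times the x-derivative."""
    return f(registration_map(t, x)) * registration_map_dx(t, x)

def integrate(func, lower, upper, n_cells):
    """Midpoint rule on [lower, upper]."""
    width = (upper - lower) / n_cells
    return width * sum(func(lower + (k + 0.5) * width) for k in range(n_cells))

masses = {t: integrate(lambda x, t=t: f_t(t, x), -60.0, 60.0, 40000)
          for t in (0.5, 1.0, 2.0)}
print(masses)
```

For $t = \theta$ the registration map is the identity, so $f^\theta$ coincides with $f$, as noted above.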
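7.1. Proof of Theorem 4.1. Recall that $\mathcal{F}_n = \sigma(\varepsilon_0, \ldots, \varepsilon_n)$ and note that $\hat\theta_{n-1}$ is
measurable with respect to $\mathcal{F}_{n-1}$. Denote, for all $x \in I_1$,

$W_n(x) = \frac{1}{h_n} K\left( \frac{x - Z_n(\hat\theta_{n-1})}{h_n} \right)$.

Then, we have the decomposition for all $x \in I_1$,

$n \hat f_n(x) = M_n(x) + N_n(x)$,

where

(7.2)    $M_n(x) = \sum_{i=1}^{n} E[W_i(x) | \mathcal{F}_{i-1}]$

and

(7.3)    $N_n(x) = \sum_{i=1}^{n} \left( W_i(x) - E[W_i(x) | \mathcal{F}_{i-1}] \right)$.

On the one hand, for a fixed $\hat\theta_{n-1}$, recall that $f^{\hat\theta_{n-1}}$ denotes the density of $Z_n(\hat\theta_{n-1})$.
Then, with the change of variables $v = (x - u)/h_i$, we have that

$E[W_i(x) | \mathcal{F}_{i-1}] = \int_{\mathbb{R}} \frac{1}{h_i} K\left( \frac{x - u}{h_i} \right) f^{\hat\theta_{i-1}}(u) du = \int_{\mathbb{R}} K(v) f^{\hat\theta_{i-1}}(x - h_i v) dv$.

Hence,

$E[W_i(x) | \mathcal{F}_{i-1}] - f^{\hat\theta_{i-1}}(x) = \int_{\mathbb{R}} \left( f^{\hat\theta_{i-1}}(x - v h_i) - f^{\hat\theta_{i-1}}(x) \right) K(v) dv$.

Moreover, we already saw that $f^t$ is twice continuously differentiable. Thus, for all
$t \in \Theta$, there exists a real $z_i = x - v h_i y$, with $0 < y < 1$, such that

(7.4)    $f^t(x - v h_i) - f^t(x) = -v h_i \, df^t(x) + \frac{v^2 h_i^2}{2} d^2 f^t(z_i)$.

Using the parity of $K$ and preliminary remarks on $d^2 f^t$, we obtain that

$\int_{\mathbb{R}} \left( f^t(x - v h_i) - f^t(x) \right) K(v) dv = \int_{\mathbb{R}} \frac{v^2 h_i^2}{2} d^2 f^t(z_i) K(v) dv$

which implies that

$\sup_{t \in \Theta} \left| \int_{\mathbb{R}} \left( f^t(x - v h_i) - f^t(x) \right) K(v) dv \right| \leq \frac{h_i^2}{2} \sup_{t \in \Theta, z \in I_1} |d^2 f^t(z)| \int_{\mathbb{R}} v^2 K(v) dv$.

Consequently, there exists $C_2 > 0$ such that

(7.5)    $\left| E[W_i(x) | \mathcal{F}_{i-1}] - f^{\hat\theta_{i-1}}(x) \right| \leq C_2 h_i^2$.

Moreover, since $f^t$ is a continuous function with respect to $t$, and $\hat\theta_n$ converges to $\theta$
almost surely, we have for all $x \in I_1$,

(7.6)    $f^{\hat\theta_{n-1}}(x) \longrightarrow f(x)$    a.s.

Consequently, Cesaro's Theorem with (7.5) imply that

(7.7)    $\frac{1}{n} M_n(x) \longrightarrow f(x)$    a.s.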



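On the other hand, since $K$ is bounded, $(N_n(x))$ is a square integrable martingale
whose predictable quadratic variation is given by

$\langle N(x) \rangle_n = \sum_{i=1}^{n} \left( E[N_i^2(x) | \mathcal{F}_{i-1}] - N_{i-1}^2(x) \right) = \sum_{i=1}^{n} \left( E[W_i^2(x) | \mathcal{F}_{i-1}] - E^2[W_i(x) | \mathcal{F}_{i-1}] \right)$.

Moreover, we also have

$E[W_i^2(x) | \mathcal{F}_{i-1}] = \frac{1}{h_i} \int_{\mathbb{R}} K^2(v) f^{\hat\theta_{i-1}}(x - h_i v) dv$.

However, (7.4) together with the regularity of $f^t(x)$ and the parity of $K$ imply that

$\sup_{t \in \Theta} \left| \int_{\mathbb{R}} \left( f^t(x - v h_i) - f^t(x) \right) K^2(v) dv \right| \leq \frac{h_i^2}{2} \sup_{t \in \Theta, z \in I_1} |d^2 f^t(z)| \int_{\mathbb{R}} v^2 K^2(v) dv$.

Consequently, there exists $C_3 > 0$ such that

(7.8)    $\left| E[W_i^2(x) | \mathcal{F}_{i-1}] - \frac{\nu^2}{h_i} f^{\hat\theta_{i-1}}(x) \right| \leq C_3 h_i$

where $\nu^2 = \int_{\mathbb{R}} K^2(u) du$. It also follows from (7.6) and Toeplitz Lemma that

$\lim_{n \to \infty} \frac{1}{\sum_{i=1}^{n} h_i^{-1}} \sum_{i=1}^{n} \frac{1}{h_i} f^{\hat\theta_{i-1}}(x) = f(x)$    a.s.

In addition, we deduce from the elementary equivalence

$\sum_{i=1}^{n} i^{\alpha} \sim \frac{n^{1+\alpha}}{\alpha + 1}$

that

$\lim_{n \to \infty} \frac{1}{n^{1+\alpha}} \sum_{i=1}^{n} \frac{1}{h_i} f^{\hat\theta_{i-1}}(x) = \frac{1}{\alpha + 1} f(x)$    a.s.

Finally, (7.8) leads to

(7.9)    $\lim_{n \to \infty} \frac{1}{n^{1+\alpha}} \sum_{i=1}^{n} E[W_i^2(x) | \mathcal{F}_{i-1}] = \frac{\nu^2}{\alpha + 1} f(x)$    a.s.

Moreover, (7.5) together with (7.6) and Cesaro's Theorem imply that

(7.10)    $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} E^2[W_i(x) | \mathcal{F}_{i-1}] = f^2(x)$    a.s.

Then, as $\alpha > 0$, we can conclude from (7.9) and (7.10) that

$\lim_{n \to \infty} \frac{\langle N(x) \rangle_n}{n^{1+\alpha}} = \frac{\nu^2}{\alpha + 1} f(x)$    a.s.

Consequently, we obtain from the strong law of large numbers for martingales given
e.g. by Theorem 1.3.15 of [7] that for any $\gamma > 0$, $(N_n(x))^2 = o\left( n^{1+\alpha} (\log(n))^{1+\gamma} \right)$
a.s. which ensures that for all $x \in I_1$,

(7.11)    $\frac{1}{n} N_n(x) \longrightarrow 0$    a.s.

Finally, combining (7.7) and (7.11), one obtains that for all $x \in I_1$,

(7.12)    $\hat f_n(x) \longrightarrow f(x)$    a.s.

ending the proof of Theorem 4.1. □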



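7.2. Proof of Theorem 4.2. Our aim is now to show that for all $x \in I_1$,

$E\left[ |\hat f_n(x) - f(x)|^2 \right] \longrightarrow 0$.

It follows from the classical bias-variance decomposition that

(7.13)    $E\left[ |\hat f_n(x) - f(x)|^2 \right] = B_n^2(x) + V_n(x)$

where

(7.14)    $B_n(x) = E\left[ \hat f_n(x) - f(x) \right]$

and

(7.15)    $V_n(x) = E\left[ \left| \hat f_n(x) - E[\hat f_n(x)] \right|^2 \right]$.

Firstly, we can write

$E\left[ \hat f_n(x) - f(x) \right] = \frac{1}{n} \sum_{i=1}^{n} E[W_i(x) - f(x)] = \frac{1}{n} \sum_{i=1}^{n} E\left[ E[W_i(x) | \mathcal{F}_{i-1}] - f(x) \right]$.

In addition, (7.5) implies that

(7.16)    $E\left[ \left| E[W_n(x) | \mathcal{F}_{n-1}] - f^{\hat\theta_{n-1}}(x) \right| \right] \longrightarrow 0$.

It also follows from the boundedness of $f^{\hat\theta_{n-1}}(x)$ and (7.6) together with the dominated
convergence Theorem that

(7.17)    $E\left[ \left| f^{\hat\theta_{n-1}}(x) - f(x) \right| \right] \longrightarrow 0$.

Hence, we deduce from (7.16) and (7.17) that

$E\left[ \left| E[W_n(x) | \mathcal{F}_{n-1}] - f(x) \right| \right] \longrightarrow 0$,

which implies by Cesaro's Theorem that

$\left| E\left[ \hat f_n(x) - f(x) \right] \right| \longrightarrow 0$

leading to

(7.18)    $B_n(x) \longrightarrow 0$.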



Secondly, we focus on the variance term $V_n(x)$. For all $1 \leq i \leq n$ and for all $x \in I_1$,
denote by $U_i(x)$ the sequence

(7.19)    $U_i(x) = W_i(x) - E[W_i(x)]$.

Then, we have the decomposition

(7.20)    $V_n(x) = \frac{1}{n^2} \sum_{i=1}^{n} E[U_i(x)^2] + \frac{2}{n^2} \sum_{i=1, i<j}^{n} E[U_i(x) U_j(x)]$.

If $i < j$, we have

$E[U_i(x) U_j(x) | \mathcal{F}_{j-1}] = U_i(x) E[U_j(x) | \mathcal{F}_{j-1}]$.

In addition, (7.5) implies that

$\left| E[U_j(x) | \mathcal{F}_{j-1}] - f^{\hat\theta_{j-1}}(x) + E[f^{\hat\theta_{j-1}}(x)] \right| \leq 2 C_2 h_j^2$.

Hence, we obtain that

$-2 C_2 h_j^2 |U_i(x)| \leq E[U_i(x) U_j(x) | \mathcal{F}_{j-1}] - U_i(x) f^{\hat\theta_{j-1}}(x) + U_i(x) E[f^{\hat\theta_{j-1}}(x)] \leq 2 C_2 h_j^2 |U_i(x)|$.

Thus, taking expectation in the previous inequality leads to

$-2 C_2 h_j^2 E[|U_i(x)|] \leq E[U_i(x) U_j(x)] - E[U_i(x) f^{\hat\theta_{j-1}}(x)] + E[U_i(x)] E[f^{\hat\theta_{j-1}}(x)] \leq 2 C_2 h_j^2 E[|U_i(x)|]$.

Finally, we obtain that

(7.21)    $|E[U_i(x) U_j(x)]| \leq \left| E[U_i(x) f^{\hat\theta_{j-1}}(x)] - E[U_i(x)] E[f^{\hat\theta_{j-1}}(x)] \right| + 2 C_2 h_j^2 E[|U_i(x)|]$.

Moreover, we have the following equality

(7.22)    $E[U_i(x) f^{\hat\theta_{j-1}}(x)] - E[U_i(x)] E[f^{\hat\theta_{j-1}}(x)] = E\left[ U_i(x) \left( f^{\hat\theta_{j-1}}(x) - f(x) \right) \right] + \left( f(x) - E[f^{\hat\theta_{j-1}}(x)] \right) E[U_i(x)]$.

Consequently, (7.21) and (7.22) together with the Cauchy-Schwarz inequality imply
that

(7.23)    $|E[U_i(x) U_j(x)]| \leq 2 \sqrt{E[U_i(x)^2]} \left( \sqrt{E\left[ |f^{\hat\theta_{j-1}}(x) - f(x)|^2 \right]} + C_2 h_j^2 \right)$.

The definition (7.19) of $U_i(x)$ also leads to

$E[U_i^2(x)] \leq E[W_i^2(x)]$

which implies by (7.8) that

(7.24)    $E[U_i^2(x)] \leq \frac{\nu^2}{h_i} E[f^{\hat\theta_{i-1}}(x)] + C_3 h_i$.



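From now on, denote by $C$ a constant which does not depend on $n$. On the one hand,
recall that (3.8) implies that for all $n \geq 1$,

(7.25)    $E\left[ |\hat\theta_n - \theta|^2 \right] \leq \frac{C}{n}$.

On the other hand, using the regularity of $f^t$, we obtain that for all $x \in I_1$,

$|f^t(x) - f(x)| \leq \sup_{t \in \Theta} |\partial f^t(x)| \, |t - \theta|$.

Hence, (7.1) and (7.25) lead to

(7.26)    $\sqrt{E\left[ |f^{\hat\theta_{n-1}}(x) - f(x)|^2 \right]} \leq \frac{C}{\sqrt{n}}$.

Then, the conjunction of (7.23), (7.24) and (7.26) implies that

(7.27)    $|E[U_i(x) U_j(x)]| \leq 2 \sqrt{\frac{\nu^2}{h_i} E[f^{\hat\theta_{i-1}}(x)] + C_3 h_i} \left( \frac{C}{\sqrt{j}} + C_2 h_j^2 \right)$.

Finally, using the boundedness of $f^t(x)$ we obtain that

(7.28)    $|E[U_i(x) U_j(x)]| \leq C \frac{1}{\sqrt{h_i}} \left( \frac{1}{\sqrt{j}} + h_j^2 \right)$.

Moreover, if $h_n = 1/n^{\alpha}$, one has

$\sum_{i=1, i<j}^{n} \frac{1}{\sqrt{h_i}} \frac{1}{\sqrt{j}} = \sum_{j=2}^{n} \frac{1}{\sqrt{j}} \sum_{i=1}^{j-1} i^{\alpha/2} \leq \sum_{j=2}^{n} j^{(\alpha+1)/2} \leq n^{\frac{3+\alpha}{2}}$

and

$\sum_{i=1, i<j}^{n} \frac{h_j^2}{\sqrt{h_i}} = \sum_{j=2}^{n} j^{-2\alpha} \sum_{i=1}^{j-1} i^{\alpha/2} \leq \sum_{j=2}^{n} j^{1 - 3\alpha/2} \leq C n^{2 - \frac{3\alpha}{2}}$.

Consequently, one deduces from the two elementary previous calculations and from
(7.28) that

(7.29)    $\frac{1}{n^2} \sum_{i=1, i<j}^{n} |E[U_i(x) U_j(x)]| \leq C \left( n^{\frac{\alpha - 1}{2}} + n^{-\frac{3\alpha}{2}} \right)$

which tends to $0$ as $n$ goes to infinity, as $0 < \alpha < 1$. In addition, thanks to (7.24)
and the boundedness of $f^t$, we have

(7.30)    $\frac{1}{n^2} \sum_{i=1}^{n} E[U_i^2(x)] \leq \frac{C}{n^2} \sum_{i=1}^{n} \frac{1}{h_i} \leq C n^{\alpha - 1}$

which tends to $0$ as $n$ goes to infinity, as $\alpha < 1$. Hence, (7.20), (7.29) together with
(7.30) allow us to conclude that for all $x \in I_1$,

(7.31)    $V_n(x) \longrightarrow 0$.

Finally, (7.13), (7.18) and (7.31) allow us to achieve the proof of Theorem 4.2. □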